Skip to content

WTF: read interrupted SP from ucontext in signalHandlerSuspendResume#235

Open
robobun wants to merge 1 commit into
mainfrom
farm/ebd23fe3/sigpwr-ucontext-sp
Open

WTF: read interrupted SP from ucontext in signalHandlerSuspendResume#235
robobun wants to merge 1 commit into
mainfrom
farm/ebd23fe3/sigpwr-ucontext-sp

Conversation

@robobun

@robobun robobun commented May 21, 2026

Copy link
Copy Markdown
Collaborator

Fixes the SIGPWR suspend-loop deadlock reported in oven-sh/bun#31158 (and oven-sh/bun#29843 — same mechanism via Prisma's WASM-backed adapter).

The bug

Thread::suspend() sends the GC suspend-resume signal to the target thread and then spins:

while (true) {
    pthread_kill(m_handle, g_wtfConfig.sigThreadSuspendResume);
    globalSemaphoreForSuspendResume->wait();
    if (m_platformRegisters) break;
    Thread::yield();
}

The handler decides whether to publish a register snapshot by checking currentStackPointer() — the SP of the handler's own frame — against thread->m_stack:

void* approximateStackPointer = currentStackPointer();
if (!thread->m_stack.contains(approximateStackPointer)) {
    thread->m_platformRegisters = nullptr;
    globalSemaphoreForSuspendResume->post();
    return;
}

That only works so long as the handler runs on the normal stack. If SA_ONSTACK is set on the sigaction and the thread has a sigaltstack installed, the handler runs on the alt stack, the check fails on every retry, and the loop spins forever.

Who trips this

Go's cgo runtime. runtime.initsig walks every signal and, for any handler it didn't install itself, calls setsigstack to force-add SA_ONSTACK so Go's own threads' synchronous faults stay on a managed alt stack. That includes our SIGPWR handler. Any c-shared library that follows the same recipe (Mono, some JNI layouts, Rust cdylibs) hits it too.

Combined with sanitizer runtimes / libbacktrace / the host's crash handler installing an alternate signal stack on the main thread (ASAN does this unconditionally), the next GC thread-suspend delivers SIGPWR onto the alt stack → stack check fails every retry → the suspender wedges at 100% CPU.

Reproducible in a plain C program to confirm currentStackPointer vs the ucontext-saved SP:

main stack near: 0x7ffdea86460c
alt stack range: [0x74fca3657010..0x74fca3697010]
Case 1 (no SA_ONSTACK): handler_sp=0x7ffdea8638a4 interrupted_sp=0x7ffdea8645a0
Case 2 (SA_ONSTACK):    handler_sp=0x74fca36963a4 interrupted_sp=0x7ffdea8645a0

The fix

Read the SP the thread was running at when the signal arrived straight out of the ucontext (kernel-saved register state). That SP is stable regardless of whether the handler itself runs on the normal or the alt stack, and the existing retry-on-alt-stack semantics still work: if the thread genuinely was on an alt stack when the signal arrived (e.g. a nested handler), the ucontext SP reflects that and the check backs off as before.

currentStackPointer() stays as the fallback for non-HAVE(MACHINE_CONTEXT) platforms.

Test

Regression test coming from the Bun side (oven-sh/bun#31161) — spawns a child that sets SA_ONSTACK on the suspend-resume signal the same way Go's initsig does, then drives the WASM install path that triggers resetInstructionCacheOnAllThreads. Passes with this fix applied; hangs to the test-runner timeout without it.

@gogakoreli

Copy link
Copy Markdown

Small suggestion: now that the handler is robust to SA_ONSTACK via ucontext SP, the comment on the SIGPWR override (from ceb3e74) should be updated to explain the full picture. A reader seeing #if OS(LINUX) && USE(BUN_JSC_ADDITIONS) + SIGPWR with no context about why it's Linux-only or how it relates to the ucontext fix will be confused.

Suggested replacement:

#if OS(LINUX) && USE(BUN_JSC_ADDITIONS)
    // Thread suspension on Linux uses a signal (macOS uses Mach ports instead —
    // this entire signal-based mechanism is Linux/FreeBSD-only). We override the
    // default SIGUSR1 to SIGPWR because npm packages commonly register
    // process.on('SIGUSR1') handlers. SIGPWR ("power failure") is effectively
    // unused on modern Linux.
    //
    // Note: dlopen'd runtimes (Go cgo, Mono, JNI) may add SA_ONSTACK to this
    // handler's sigaction flags via their init routines. The handler reads the
    // interrupted SP from ucontext rather than its own frame pointer, making it
    // robust to alt-stack delivery regardless of flag mutation.
    // See oven-sh/bun#31158.
    g_wtfConfig.sigThreadSuspendResume = SIGPWR;
#endif

This ties together the "why SIGPWR", "why Linux-only", and "why it's safe" in one place.

@coderabbitai

coderabbitai Bot commented May 21, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a6bec3cb-d7a0-415b-98e0-c537f66d074f

📥 Commits

Reviewing files that changed from the base of the PR and between a355e3e0cc5b7d4d23b9a8195368bf153592a61a and 3593e2c.

📒 Files selected for processing (1)
  • Source/WTF/wtf/posix/ThreadingPOSIX.cpp

Walkthrough

The signal-handler suspend/resume mechanism is improved to detect the interrupted thread's stack pointer from machine-context ucontext_t register state instead of runtime approximation. A new interruptedStackPointer() helper extracts the stack pointer on supported architectures, and signalHandlerSuspendResume uses it to validate whether the handler interrupted normal or alternate stack execution, gracefully backing off if the interrupted execution occurred outside the normal stack bounds.

Changes

Signal handler stack pointer reliability

Layer / File(s) Summary
Machine-context stack pointer extraction
Source/WTF/wtf/posix/ThreadingPOSIX.cpp
New interruptedStackPointer(ucontext_t*) helper (under HAVE(MACHINE_CONTEXT)) reads the interrupted stack pointer directly from kernel-saved ucontext_t register fields for multiple Linux and FreeBSD CPU architectures, with fallback to nullptr on unsupported configurations.
Signal handler suspend/resume with reliable stack detection
Source/WTF/wtf/posix/ThreadingPOSIX.cpp
Thread::signalHandlerSuspendResume now uses interruptedStackPointer() to obtain stackPointerToCheck (with fallback to currentStackPointer()), validates it against the thread's recorded stack bounds, and either captures m_platformRegisters from ucontext or clears them and backs off for retry based on containment. Initialization comment updated to document SA_ONSTACK behavior in dynamically loaded runtimes and the role of ucontext-based stack pointer reading in handler correctness.

Possibly related issues

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: reading the interrupted stack pointer from ucontext in the signalHandlerSuspendResume function, which is the core fix for the deadlock issue.
Description check ✅ Passed The description is comprehensive and well-structured, covering the bug mechanism, root cause, fix details, affected users, test approach, and relevant issue references. It exceeds the template requirements by providing clear technical context.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 21, 2026

Copy link
Copy Markdown

Preview Builds

Commit Release Date
3593e2cc autobuild-preview-pr-235-3593e2cc 2026-06-17 15:03:21 UTC
66e7173c autobuild-preview-pr-235-66e7173c 2026-06-16 23:41:58 UTC
e478893a autobuild-preview-pr-235-e478893a 2026-06-03 00:17:26 UTC
0eedb192 autobuild-preview-pr-235-0eedb192 2026-05-26 14:57:36 UTC
925c056e autobuild-preview-pr-235-925c056e 2026-05-25 11:47:13 UTC
f9079851 autobuild-preview-pr-235-f9079851 2026-05-23 04:56:47 UTC
a355e3e0 autobuild-preview-pr-235-a355e3e0 2026-05-21 04:48:50 UTC

robobun added a commit to oven-sh/bun that referenced this pull request May 21, 2026
Point WEBKIT_VERSION at the preview autobuild of oven-sh/WebKit#235,
which teaches signalHandlerSuspendResume to read the interrupted
thread's SP from the ucontext instead of the handler's own
currentStackPointer(). That's the root-cause fix — regardless of
what any dlopen'd library (Go cgo, Mono, JNI, Rust cdylib, …) does
to SA_ONSTACK, the stack-range check always sees the interrupted
stack rather than wherever the handler landed.

Delete the per-dlopen sigaction scan on the Bun side. With the WTF
change in place it's pure overhead, and the reporter rightly pointed
out that chaining workarounds for workarounds is the wrong direction.

Regression test (test/regression/issue/31158.test.ts) is unchanged
and now exercises the upstream fix directly.

Swap the preview-pr-235 tag for the merged autobuild hash once
oven-sh/WebKit#235 lands.
@robobun robobun force-pushed the farm/ebd23fe3/sigpwr-ucontext-sp branch 3 times, most recently from 925c056 to 0eedb19 Compare May 26, 2026 12:54
@robobun robobun force-pushed the farm/ebd23fe3/sigpwr-ucontext-sp branch from 0eedb19 to e478893 Compare June 2, 2026 23:37
@robobun robobun force-pushed the farm/ebd23fe3/sigpwr-ucontext-sp branch 2 times, most recently from 71211c9 to 71e0011 Compare June 16, 2026 22:42
Comment thread Source/WTF/wtf/posix/ThreadingPOSIX.cpp
@robobun robobun force-pushed the farm/ebd23fe3/sigpwr-ucontext-sp branch from 71e0011 to 66e7173 Compare June 16, 2026 23:05
robobun added a commit to oven-sh/bun that referenced this pull request Jun 16, 2026
…e7173c)

Rebase of the signalHandlerSuspendResume ucontext-SP fix onto the
current WebKit main pin 09f04cd5, plus a cross-reference comment
tying interruptedStackPointer() to JSC::MachineContext::stackPointerImpl
so a future arch addition updates both copies.
signalHandlerSuspendResume validated the snapshot by calling
currentStackPointer(), the SP of the handler frame itself. That
assumes the handler always runs on the thread's normal stack, which
fails when SA_ONSTACK is set on our sigaction and the thread has a
sigaltstack installed (Go's cgo initsig does this to inherited
handlers; combined with ASAN/crash-handler alt stacks the check fails
on every retry and Thread::suspend() spins forever). Read the
interrupted SP from the ucontext instead, which is stable regardless
of which stack the handler runs on. See oven-sh/bun#31158, oven-sh/bun#29843.
@robobun robobun force-pushed the farm/ebd23fe3/sigpwr-ucontext-sp branch from 66e7173 to 3593e2c Compare June 17, 2026 14:26
Comment on lines +244 to +245
// stays correct because it reads the interrupted stack pointer from the ucontext rather
// than its own frame pointer. See oven-sh/bun#31158.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The comment says the handler reads the interrupted SP from ucontext "rather than its own frame pointer", but the pre-fix code called currentStackPointer(), which reads the stack pointer (SP/RSP), not the frame pointer (FP/RBP). The parallel comment inside signalHandlerSuspendResume correctly says "the handler's own SP"; suggest matching that here: "…rather than the handler's own stack pointer."

Extended reasoning...

What the comment says vs. what the code did

The new comment in initializePlatformThreading() reads:

signalHandlerSuspendResume stays correct because it reads the interrupted stack pointer from the ucontext rather than its own frame pointer.

The contrast being drawn is between the new behaviour (read the interrupted thread's SP from ucontext->uc_mcontext) and the old behaviour. But the old behaviour was a call to currentStackPointer() — defined in wtf/StackPointer.h — which returns the architectural stack pointer (RSP on x86_64, sp on arm64, etc.). It does not read the frame pointer (RBP / FP). So the comment names the wrong register.

Why the distinction matters here

In most prose, "frame pointer" used loosely for "a pointer into the current frame" would be harmless. But this is a file that has just added interruptedStackPointer(), which literally pulls gregs[REG_RSP] / .sp / .arm_sp / mc_rsp out of mcontext_t, and the whole PR is about distinguishing which SP gets checked. In that context, SP vs. FP is not loose terminology — a reader familiar with the difference may pause and wonder whether the old code did FP-based introspection (it didn't), or whether there's some FP-vs-SP subtlety in the alt-stack case (there isn't).

There's also an internal inconsistency: the parallel comment inside signalHandlerSuspendResume itself, added in the same diff, gets it right — "Fall back to the handler's own SP on platforms where we don't have a machine context." The two comments describe the same fallback and should use the same term.

Addressing the refutation

One verifier argued this is below the bar because the intent is clear from context and the wording came from a reviewer's suggested comment. Both points are true — no one will misread the mechanism. But the fix is a one-word change in a comment that is specifically about register-level behaviour, and the same PR already uses the correct term twenty lines up. Aligning the two costs nothing and removes a small "wait, frame pointer?" stumble for future readers. That's squarely nit territory, and it's filed as such.

Step-by-step

  1. Pre-fix code: void* approximateStackPointer = currentStackPointer(); — reads SP of the handler's current frame.
  2. Post-fix code: reads interruptedStackPointer(userContext) from ucontext, falling back to currentStackPointer().
  3. The comment at line 245 describes (2) as "reads the interrupted stack pointer from the ucontext rather than its own frame pointer".
  4. Neither (1) nor the fallback in (2) ever touched the frame pointer; both read the stack pointer.
  5. Therefore "frame pointer" → "stack pointer" (or "the handler's own stack pointer", matching the in-handler comment).

Impact

None at runtime — comment-only. Documentation accuracy in low-level register-handling code.

Fix

-        // stays correct because it reads the interrupted stack pointer from the ucontext rather
-        // than its own frame pointer. See oven-sh/bun#31158.
+        // stays correct because it reads the interrupted stack pointer from the ucontext rather
+        // than the handler's own stack pointer. See oven-sh/bun#31158.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants